COVID-19 confirmed cases and deaths. A rapid tool called the first-digit law, or the fulfillment of
Benford’s law, was used to suggest good data quality for epidemiological surveillance. Data analysis used
the Chi-squared test and the log-likelihood ratio test. In addition, the mean absolute deviation (MAD) was
reported to gauge how close the data are to the Benford’s law distribution. The results showed that both
the confirmed-case and death-case distributions were statistically non-conforming with the Benford’s law
distribution. In terms of data quality regarding the COVID-19 pandemic, the epidemiological surveillance
system falls short of the Benford’s law assumption. Benford’s law has been acknowledged as an initial
analysis that can expeditiously assess the performance of a surveillance system. The next phase is to
conduct a complete evaluation of the surveillance system, especially in post-pandemic COVID-19.

Keywords: COVID-19; Epidemiological; First-digit law; SARS-CoV-2; Surveillance
1. INTRODUCTION 

The transmission of coronavirus disease (COVID-19) has affected many countries, including Indonesia,
which experienced a rise in the number of cases over a short period. As an infectious disease,
COVID-19 is transmitted via droplets and contact with the virus, which then enters the body through
exposed mucosa [1]. An outbreak of pneumonia of unknown origin was reported in Wuhan, Hubei Province,
China, in December 2019. The global spread of severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2) was subsequently reported by the World Health Organization (WHO) on March 12, 2020,
and COVID-19 has caused thousands of deaths [2]. 

Based on the national trend, COVID-19 was still occurring in Indonesia in 2023, with cumulative totals
in May of 6,798,097 confirmed cases and 161,630 deaths. Within Indonesia, Jakarta is the region with
the highest number of COVID-19 cases, followed by Central Java and then East Java [3]. In March 2020,
according to a study of the responses to COVID-19 in Indonesia, surveillance and epidemiological
analyses were emphasized in order to understand the scope of the COVID-19 situation in Indonesia [4]. 


Journal homepage: http://ijphs.iaescore.com 


ISSN: 2252-8806 


Surveillance systems have a fundamental function in controlling the COVID-19 pandemic [5]. In
practice, a surveillance system faces challenges such as differing data collection platforms, poor
interoperability, data duplication, data integration, data completeness, and data analysis, all of which
ultimately affect countermeasure responses [6]. A complete evaluation of a surveillance system
should assess its attributes: flexibility, sensitivity, simplicity,
acceptability, representativeness, stability, timeliness, and positive predictive value (PPV) [5], [7].
Epidemiological surveillance systems are commonly evaluated after an epidemic, so rapid evaluation
techniques are needed to determine whether reported cases meet expectations while an outbreak is under way [8], [9].
To this end, Benford’s law, also known as the law of first digits, the Newcomb-Benford law, or the law of
anomalous numbers, has been used to evaluate the data quality of surveillance systems, especially in public
health surveillance [10]. 

In mathematics, there are methods for assessing the authenticity of data. One of these is based
on the frequency with which each digit occurs as the first digit, known as Benford’s law. In 1938, the
physicist Frank Benford found that the digit one appears as the first digit of many naturally occurring
data sets more often than two, two more often than three, and so on: the frequency of occurrence
decreases as the first digit increases [11]-[13]. 

In more detail, Benford’s law gives the expected frequency of each leading digit in a series of
numerical data. If the data are generated without intention, the observed digit frequencies
should agree with the frequencies expected under Benford’s law. Conversely, if there is an element of human
intention in creating or inserting particular numbers into a data set, a Benford’s law analysis
will show that specific digits appear more or less often than expected. Benford’s law is widely used in
various fields because it can flag anomalous data in a data set, and such anomalies can in turn help to
detect fraud [14], [15]. 

For the field of epidemic control, a reliable epidemiological surveillance system is essential. Providing 
high-quality data so that decisions can be made using evidence is one of its responsibilities [16]. In this study, 
Benford’s law analysis was used to evaluate the first significant digit distribution of daily confirmed cases and 
COVID-19 related deaths in Indonesia. 


2. METHOD 

A quantitative method was used in this research. A quantitative study is a systematic scientific analysis
of components and phenomena and of the causes of their associations [17]. A method based on Benford’s law
was applied in this work [18]. To assess data quality in an epidemiological surveillance system,
the conformity of COVID-19 confirmed cases and deaths to Benford’s distribution was examined. This research used
COVID-19 epidemiological surveillance data obtained from the WHO
website https://covid19.who.int/, namely the daily reports of confirmed cases and deaths caused by
COVID-19 covering all subjects in Indonesia [19]. The cases included in this study were reported from March 2020
to January 2021, a period chosen to precede the mass vaccination intervention in Indonesia
[20]. Data quality was evaluated by analyzing the conformity of the first digits of the daily
confirmed cases and deaths to Benford’s distribution. Data analysis combined visual analysis
of figures with statistical analysis using the Chi-squared test and the log-likelihood ratio test; in addition,
the mean absolute deviation (MAD) of the differences was reported. The analysis was carried out in the
statistical software STATA 17. 

Benford’s law, commonly referred to as the first-digit law, is a mathematical phenomenon that describes
how leading digits are distributed across numerous real-world datasets. The analysis rests on physics-related
assumptions regarding the distribution of naturally occurring data. Using integral calculus, Benford calculated
how often a digit or a combination of digits is likely to appear. He formulated the expected frequency
of each digit as the first digit by
assuming that the occurrence follows a logarithmic distribution, a pattern summarized in
the following formula [11]: 


$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9$$

where P(d) is the probability that the digit d is the leading digit.


According to Benford’s law, data whose leading digit is one (30.103%) appear more often than data
that begin with any other digit, with probabilities decreasing from 2 to 9 (respectively 17.609%; 12.494%; 9.691%;
7.918%; 6.695%; 5.799%; 5.115%; and 4.576%). In simpler terms, smaller digits
(1, 2, 3) have a higher probability of being the leading digit, while larger digits (7, 8, 9) have a lower probability
[21]. In this study, Benford’s law was used to analyze the conformity of the epidemiological data by
comparing the first-digit (d) frequencies of the daily COVID-19 case and death counts with the
probabilities given by the formula. 
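
As an illustration of the formula above, the following Python sketch computes the expected first-digit
probabilities and tallies the leading digits of a series of daily counts. The function names and the sample
counts are hypothetical and are not part of the study, which was carried out in STATA 17. Skipping days with
zero reported counts is an assumption made here, since zero has no leading digit.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Return the leading digit of a positive integer count."""
    if value <= 0:
        raise ValueError("first digit is undefined for zero or negative counts")
    return int(str(value)[0])


def first_digit_counts(daily_counts):
    """Tally leading digits 1..9, skipping days with zero reported counts (assumption)."""
    tally = Counter(first_digit(v) for v in daily_counts if v > 0)
    return {d: tally.get(d, 0) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical daily counts, for illustration only (not the WHO data).
    sample = [2, 14, 19, 27, 106, 113, 1043, 1671, 3003, 8854]
    print(first_digit_counts(sample))
    for d, p in benford_expected().items():
        print(d, f"{100 * p:.3f}%")
```

The printed percentages can be checked against the values quoted in the paragraph above.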
3. RESULTS AND DISCUSSION 

The quality of epidemiological surveillance system data can be assessed quickly using Benford’s law
analysis. Several studies conducted during the avian influenza/H1N1 (AI) pandemic and the
dengue epidemic in Paraguay assessed surveillance system performance in each country using
Benford’s law [22], [23]. A surveillance system is considered to work well if the reported data follow the
first-digit distribution. According to Benford’s law, the digit that appears most frequently as the first
digit in surveillance reports is one (30.103%), followed by the digits two to nine
(respectively, 17.609%, 12.494%, 9.691%, 7.918%, 6.695%, 5.799%, 5.115%, 4.576%) [21]. 

Experience with implementing the International Health Regulations shows that developing countries face
challenges in disease control and response. One of them is that governments are
aware of gaps in disease surveillance capacity that affect their ability to monitor and respond to disease.
The COVID-19 pandemic has demonstrated a failure to respond to the emergence of highly infectious
and deadly microbes [24], [25]. It is necessary to strengthen the existing health system and build an adequate
surveillance system to prevent the next pandemic [26], [27]. 

In this study, the first step was to evaluate the quality of surveillance data using Benford’s law by
checking whether confirmed cases and COVID-19 deaths follow the expected distribution. The
following is an overview of confirmed cases and deaths from March 2020 to January 2021.
Observations ended on January 12, 2021, before the mass COVID-19 vaccination intervention in
Indonesia. Figure 1 shows the daily confirmed cases and deaths in Indonesia from March 2020 to January 2021.
Both series show an increasing trend: although there was a decrease at a certain
point, confirmed cases and deaths subsequently continued to rise. 
Figure 1. Daily occurrence of confirmed cases and deaths of COVID-19 from March 2020 to January 2021 


The COVID-19 surveillance data were evaluated with the first-digit test, as shown in Tables 1 and 2,
which indicates whether the observed first-digit distribution contains anomalies
or conforms to the expected distribution. Conformity was assessed with goodness-of-fit tests,
using the p-value as the measure of significance. The p-values were obtained from the Chi-square test and
the likelihood ratio test. In addition, the MAD was calculated to see how closely the data match the
Benford’s law distribution. The null hypothesis H0 is that the two distributions (the observed cases and
the Benford’s law distribution) are the same, meaning that the observed distribution follows the one
predicted by Benford’s law. The alternative hypothesis H1 is that the two distributions differ. Hence,
the larger the p-value, the weaker the evidence against H0, that is, against the observed distribution
following the expectations of Benford’s law. In addition,
graphs are presented that compare the observed data distribution with the Benford’s law
distribution. Figures 2 and 3 summarize the distribution of the daily number of confirmed cases and
deaths caused by COVID-19 and compare it with the distribution determined by Benford’s law. Tables 1
and 2 display the results of the first-digit test analysis for compliance with Benford’s law. 
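
The two goodness-of-fit tests described above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the STATA procedure used in the study: the function names are
hypothetical, the tests are assumed to use 8 degrees of freedom (nine digit classes minus one), the p-value
comes from a closed-form chi-square upper tail that is valid only for even degrees of freedom, and the
sample counts are made up for demonstration.

```python
import math

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def pearson_chi2(observed):
    """Pearson chi-square statistic against Benford expected counts."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, BENFORD))


def log_likelihood_ratio(observed):
    """G statistic, 2 * sum(O * ln(O / E)); empty cells contribute zero."""
    n = sum(observed)
    return 2.0 * sum(
        o * math.log(o / (n * p)) for o, p in zip(observed, BENFORD) if o > 0
    )


def benford_tests(observed):
    """Return (chi2, p_chi2, G, p_G), assuming len(observed) - 1 degrees of freedom."""
    df = len(observed) - 1
    chi2, g = pearson_chi2(observed), log_likelihood_ratio(observed)
    return chi2, chi2_sf_even_df(chi2, df), g, chi2_sf_even_df(g, df)


if __name__ == "__main__":
    # Hypothetical first-digit counts for digits 1..9 (illustration only).
    counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
    chi2, p_chi2, g, p_g = benford_tests(counts)
    print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}")
    print(f"G = {g:.3f}, p = {p_g:.4f}")
```

A small p-value from either test is evidence against H0, that is, against conformity with Benford’s law.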
Table 1. First-digit distribution of confirmed cases and tests of significance

| Digit | Count | Observed (%) | Expected (Benford) (%) | Diff. (MAD) | p-value | Chi-square (p-value) | Log likelihood ratio (p-value) |
|---|---|---|---|---|---|---|---|
| 1 | 77 | 25.581 | 30.103 | -4.522 | 0.0900 | 0.0000 | 0.0000 |
| 2 | 37 | 12.292 | 17.609 | -5.317 | 0.0152 | | |
| 3 | 56 | 18.605 | 12.494 | 6.111 | 0.0022 | | |
| 4 | 54 | 17.940 | 9.691 | 8.249 | 0.0000 | | |
| 5 | 19 | 6.312 | 7.918 | -1.606 | 0.3376 | | |
| 6 | 30 | 9.967 | 6.695 | 3.272 | 0.0281 | | |
| 7 | 10 | 3.322 | 5.799 | -2.477 | 0.0644 | | |
| 8 | 11 | 3.654 | 5.115 | -1.461 | 0.2952 | | |
| 9 | 7 | 2.326 | 4.576 | -2.250 | 0.0708 | | |
| Total | 301 | 100.000 | 100.000 | 3.918 | | | |

Table 2. First-digit distribution of death cases and tests of significance

| Digit | Count | Observed (%) | Expected (Benford) (%) | Diff. (MAD) | p-value | Chi-square (p-value) | Log likelihood ratio (p-value) |
|---|---|---|---|---|---|---|---|
| 1 | 116 | 38.538 | 30.103 | 8.435 | 0.0020 | 0.0000 | 0.0000 |
| 2 | 32 | 10.631 | 17.609 | -6.978 | 0.0011 | | |
| 3 | 20 | 6.645 | 12.494 | -5.849 | 0.0012 | | |
| 4 | 19 | 6.312 | 9.691 | -3.379 | 0.0506 | | |
| 5 | 20 | 6.645 | 7.918 | -1.274 | 0.5208 | | |
| 6 | 20 | 6.645 | 6.695 | -0.050 | 1.0000 | | |
| 7 | 25 | 8.306 | 5.799 | 2.506 | 0.0823 | | |
| 8 | 31 | 10.299 | 5.115 | 5.184 | 0.0003 | | |
| 9 | 18 | 5.980 | 4.576 | 1.404 | 0.2668 | | |
| Total | 301 | 100.000 | 100.000 | 3.895 | | | |

Figure 2 compares the first-digit distribution of COVID-19 confirmed cases with the distribution
predicted by Benford’s law. The graph of the daily reports of confirmed cases of COVID-19
shows that the first digits two, three, four, and six do not follow the predicted distribution according
to Benford’s law and its confidence intervals. Similarly, Figure 3 compares the first-digit distribution of
COVID-19 deaths with the distribution expected under Benford’s law. The first digits one, two, three,
seven, and eight do not follow the expected distribution, judging from Benford’s frequency distribution
and confidence intervals. 
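
A per-digit comparison of this kind can be sketched as below. The text does not state how the confidence
intervals in Figures 2 and 3 or the per-digit p-values in Tables 1 and 2 were computed, so this sketch
assumes a normal approximation with a 95% interval around each Benford proportion. The function name is
hypothetical, and the resulting p-values need not match those in the tables, which may come from a
different (for example, exact) test.

```python
import math

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def digit_z_tests(observed, z_crit=1.96):
    """Compare each digit's observed share with Benford's share (normal approximation).

    Returns a list of (digit, z, two_sided_p, outside_interval) tuples.
    """
    n = sum(observed)
    results = []
    for digit, (count, p) in enumerate(zip(observed, BENFORD), start=1):
        se = math.sqrt(p * (1 - p) / n)  # standard error under Benford's law
        z = (count / n - p) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
        results.append((digit, z, p_value, abs(z) > z_crit))
    return results


if __name__ == "__main__":
    confirmed = [77, 37, 56, 54, 19, 30, 10, 11, 7]  # counts from Table 1
    for digit, z, p_value, flagged in digit_z_tests(confirmed):
        status = "outside" if flagged else "inside"
        print(digit, f"z={z:+.2f}", f"p={p_value:.4f}", status)
```

Under this assumed approximation, the Table 1 counts place digits two, three, four, and six outside the
interval, in line with the description of Figure 2.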

In the analysis of conformity with Benford’s law, both the confirmed-case and death-case variables
lead to rejection of the null hypothesis. The likelihood ratio and Chi-square test results
in Tables 1 and 2 have a p-value of 0.000 (p-value<0.05), so the alternative hypothesis is accepted:
there is a statistically significant difference between the observed and expected (Benford) distributions. In
addition, the MAD was used to gauge how similar the observed data and the Benford
distribution are. Both variables, confirmed cases and deaths caused
by COVID-19, have a MAD value of >0.015, suggesting that they do not follow the Benford
distribution (3.918 and 3.895 percentage points, respectively, that is, about 0.039 as proportions). 
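
The MAD values can be recomputed from the first-digit counts listed in Tables 1 and 2, as in the sketch
below. It assumes that the MAD is the mean of the absolute differences between observed and Benford
proportions over the nine digits, and that the 0.015 threshold refers to the proportion scale while the
tables report percentage points. Both are readings of the text rather than statements from it, and the
function name is hypothetical.

```python
import math

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# First-digit counts as listed in Tables 1 and 2.
CONFIRMED = [77, 37, 56, 54, 19, 30, 10, 11, 7]
DEATHS = [116, 32, 20, 19, 20, 20, 25, 31, 18]

THRESHOLD = 0.015  # nonconformity cut-off quoted in the text (proportion scale assumed)


def mad(observed):
    """Mean absolute deviation between observed and Benford proportions."""
    n = sum(observed)
    return sum(abs(o / n - p) for o, p in zip(observed, BENFORD)) / len(BENFORD)


if __name__ == "__main__":
    for label, counts in (("confirmed", CONFIRMED), ("deaths", DEATHS)):
        m = mad(counts)
        verdict = "nonconforming" if m > THRESHOLD else "conforming"
        print(f"{label}: MAD = {m:.5f} ({100 * m:.3f} percentage points), {verdict}")
```

On the percentage scale the two results agree with the Total rows of Tables 1 and 2.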

These results are partly consistent with an investigation performed in India
using daily data on COVID-19 patients and deaths in India and Kerala. In that analysis, the
distribution of COVID-19 cases complied with Benford’s first-digit law, but the death reports in India’s
national data and in the state of Kerala did not match the Benford’s law distribution. This is shown by the
MAD values of COVID-19 deaths for India’s national data (0.0171) and for the state of Kerala
(0.0415), which indicate nonconformity with the Benford distribution [28]. These results are also in line with
research by Kilani and Georgiou, which found that many reported COVID-19 data were
inconsistent with the Benford’s law distribution, especially in developing countries [29]. 

The studies above were conducted in developing countries. In developed countries
such as the United States (US), similar findings have been reported, and some COVID-19 data may have gone
unreported. A US study that used Benford’s law to judge the accuracy of epidemiological data
suggested that COVID-19 deaths were not reported in several US states. The degree of compliance
with Benford’s law depends on the testing procedure; still, the most substantial and clearest evidence,
obtained with the MAD criteria, points to unreported COVID-19 deaths in the US [30]. The picture is slightly
different in studies conducted in China, where Benford’s law analysis found no evidence of
nonconformity in the COVID-19 data [31]. 

When Benford’s law is not fulfilled, there are several potential scenarios. If the observed
distribution of death data does not conform to Benford’s law and the number of deaths is greater than the
average death rate, the response to the pandemic was likely inadequate. If Benford’s law is not fulfilled
while mortality rates are declining, potential reasons are insufficient scope or coverage, or that the
country is in the early phase of an epidemic/pandemic. An inefficient surveillance system is indicated by
a shortage of diagnostic tests, restricted scope, or being in the early stages of an epidemic, when the
Benford distribution has yet to emerge. After an epidemic or pandemic ends, complementary and more
stringent studies can be carried out to evaluate the other attributes of the surveillance system [23], [32]. 

Although the Benford’s law analysis concluded that the data on confirmed cases
and deaths caused by COVID-19 differ from the expected distribution, these
results do not amount to a complete evaluation of the surveillance system. Benford’s law has been accepted
and recognized in several studies as a preliminary analysis that can quickly assess the performance of a
surveillance system [32]-[34]. An overall evaluation of a surveillance system should consider its
attributes, such as adaptability, sensitivity, acceptability, simplicity, representativeness, stability, timeliness,
and PPV, but these are commonly evaluated only after an epidemic. Benford’s law has been an effective method
for evaluating the accuracy or reliability of the data produced by various nations or even distinct regions
within one nation [16]. One study used Benford’s law to test the reliability of COVID-19 death-case reporting
in nations with authoritarian regimes and concluded that, in their reported
COVID-19 death-case numbers, democratically run nations adhere to Benford’s law more closely than
authoritarian ones [35]. Because Benford’s law can be used to test the reliability of a data set, a study
in another field used it to assess the quality of widely employed survey data sets and concluded that
nearly all of them show substantial deviations, strongly suggesting reliability issues in the
survey data [36]. 

It should be noted, however, that this evaluation covers only some elements of the
surveillance system involved in managing the COVID-19 pandemic. A complete evaluation covering
the attributes of the surveillance system is still required [37]. A quick
evaluation using Benford’s law provides feedback to relevant stakeholders in Indonesia, and ongoing
evaluation allows the responsible government bodies to make appropriate decisions to enhance
epidemiological surveillance systems. 
4. CONCLUSION 

In terms of conformity with Benford’s law, the confirmed cases and deaths caused by COVID-19
do not conform to the Benford distribution. This conclusion is based on the Chi-square
test and the likelihood ratio test. The next stage of this study would be to conduct a complete evaluation
of the surveillance system that covers its attributes,
especially in the post-pandemic period of COVID-19. 